Oce-Technologies B.V., of Venlo
Segmenting an image via a graph. 


The invention relates to a method of segmenting a composite image of pixels
into a number of fields corresponding to lay-out elements of the image, the pixels
having a value representing the intensity and/or color of a picture element.

The invention further relates to a device implementing the method, which
device comprises an input unit for inputting an image and a processing unit.

Several methods for segmenting a composite image, such as a document 
including text and figures, to identify fields corresponding to layout elements are
known in the art, and a common approach is based on processing the background.
The image is represented by pixels that have a value representing the intensity 
and/or color of a picture element. The value is classified as background (usually 
white) or foreground (usually black, being printed space). The white background 
space that surrounds the printed regions on a page is analyzed. 


A method for page segmentation is known from the article "Image
Segmentation by Shape-Directed Covers" by H.S. Baird et al. in "Proceedings 10th
International Conference on Pattern Recognition, Atlantic City, NY, June 1990, pp.
820-825". In an image to be analyzed, a set of maximal rectangles of background
pixels is constructed, a maximal rectangle being a rectangle that cannot be enlarged
without including a foreground pixel. Segmentation of the image into
information-bearing fields, i.e. text columns, is achieved by covering the total image
with a reduced set of the maximal rectangles. The remaining 'uncovered' area is
considered foreground and may be used for further analysis. A problem of the method
is that the fields are defined as areas in the pixel domain, which does not allow
computationally efficient further processing.

Further a method for page segmentation is known from the article "Flexible
page segmentation using the background" by A. Antonacopoulos and R.T. Ritchings
in "Proceedings 12th International Conference on Pattern Recognition, Jerusalem,
Israel, October 9-12, IEEE-CS Press, 1994, vol. 2, pp. 339-344". The background
white space is covered with tiles, i.e. non-overlapping areas of background pixels.
The contour of a foreground field in the image is identified by tracing along the white
tiles that encircle it, such that the inner borders of the tiles constitute the border of a
field for further analysis. A problem of the method is that the borders of the fields are
represented by a complex description which frustrates an efficient further analysis.


It is an object of the invention to provide a method and device for segmenting 
an image which is more efficient and in particular delivers a simple description of the 
segmented image that can easily be used in further processing steps. 

According to a first aspect of the invention the object is achieved by providing
a method of segmenting an image of pixels into a number of fields, comprising:

- constructing separating elements corresponding to rectangular areas of adjacent
pixels having a background property indicative of a background of the image,

- constructing a graph representing the lay-out structure of the image by defining
vertices of the graph on the basis of intersections of separating elements that are
substantially oriented in predetermined separation directions, in particular horizontal
and vertical direction, and defining edges of the graph between the nodes
corresponding to the field separators,

- defining field separators corresponding to the edges of the graph.

According to a second aspect of the invention the object is achieved with a
device for segmenting an image of pixels into a number of fields corresponding to
lay-out elements of the image, the pixels having a value representing the intensity
and/or color of a picture element, which device comprises

- an input unit for inputting an image, and

- a processing unit for constructing a graph representing the lay-out structure of the
image by

- constructing separating elements corresponding to rectangular areas of adjacent
pixels having a background property indicative of a background of the image,

- defining vertices of the graph based on intersections of separating elements that
are substantially oriented in different separation directions, in particular horizontal
and vertical direction, and

- defining edges of the graph between the vertices corresponding to the separating
elements.

According to a third aspect of the invention the object is achieved with a
computer program product for performing the method.
The advantage of constructing the graph is that the edges provide a compact
and efficient representation of the borders of the fields. Further analysis of the fields
based on the graph is computationally efficient.

The invention is also based on the following recognition. A graph
representation has been proposed but rejected as being too complex in
segmentation in the article by A. Antonacopoulos as described above. The inventors
have seen that the graph of Antonacopoulos does not represent the fields at all, but
only provides a representation of the background tiles in the image and their
adjacency. The graph constructed according to the invention, however, directly
covers the fields based on the structure of the background, and therefore provides a
high-level representation of the fields in the layout of the image.

In an embodiment the step of defining vertices comprises constructing
subsets of separating elements that are substantially oriented in the predetermined
separation directions, and determining the intersections between pairs of separating
elements from both subsets. This has the advantage that the vertices in the graph
are constructed in an efficient way.

In a further embodiment the method comprises constructing a set of maximal
rectangles, a maximal rectangle being a rectangular part of the image in one of the
separation directions, that has the maximum possible area without including a pixel
not having the background property indicative of a background of the image, and
constructing the separating elements in a cleaning step wherein at least one pair of
overlapping maximal rectangles in the set is replaced by an informative rectangle that
is a rectangular part of an area combining the areas of the pair, which rectangular
part has the maximum possible length in the relevant separation direction.

This has the effect that separating elements that are long and narrow along a
separation direction are constructed efficiently. The advantage is that separating
elements most informative for separating fields are constructed and fields enclosed
by the separating elements are detected easily. Although initially a large number of
maximal rectangles are found, the cleaning step efficiently reduces said number so
that a computationally efficient procedure for construction of the separating elements
is possible.

In an embodiment of the method, prior to said constructing the maximal
rectangles, the image is filtered by detecting foreground separator elements that are
objects in the foreground of the image having a pattern of pixel values deviating from
said background property, in particular black lines or dashed or dotted lines, and by
replacing pixels of the detected foreground separators by pixels having the
background property. The effect of replacing foreground separators by the
background color is that larger and more relevant areas of background are formed.
The advantage is that larger background areas are present without additional
computational steps. This results in larger maximal rectangles, which improves the
quality of the resulting segmentation.

Further preferred embodiments of the device according to the invention are
given in the further claims.






These and other aspects of the invention will be apparent from and elucidated
further with reference to the embodiments described by way of example in the
following description and with reference to the accompanying drawings, in which
Figure 1 shows an overview of an exemplary segmentation method,
Figure 2 shows a part of a sample Japanese newspaper,
Figure 3 shows the merging of objects along a single direction,
Figure 4 shows segmentation and two directional merging of objects,
Figure 5 shows construction of a maximal rectangle from white runs,
Figure 6 shows construction of maximal white rectangles,
Figure 7 shows cleaning of overlapping maximal white rectangles,
Figure 8 shows a graph on a newspaper page,
Figure 9 shows two types of intersection of maximal rectangles, and
Figure 10 shows a device for segmenting a picture.

The Figures are diagrammatic and not drawn to scale. In the Figures, elements 
which correspond to elements already described have the same reference numerals. 

Figure 1 shows an overview of an exemplary segmentation method, showing
three basic steps from known segmentation systems. The input image 11 is
processed in a CCA module 14 that analyses the pixels of the image using
Connected Component Analysis. First an original picture that may be a
black-and-white, grayscale or coloured document, e.g. a newspaper page, is scanned,
preferably in gray scale. Grayscale scanned pictures are halftoned for assigning a
foreground value (e.g. black) or a background value (e.g. white) to each pixel. The
CCA module 14 finds foreground elements in the image by detecting connected
components (CC) of adjacent pixels having similar properties. An example of the first
steps in the segmentation process is described, for instance, in US 5,856,877. The
CCA module produces as output CC Objects 12, that are connected components of
connected foreground pixels. An LA module 15 receives the CC Objects 12 as input
and produces Layout Objects 13 by merging and grouping the CC Objects to form
larger layout objects such as text lines and text blocks. During this phase, heuristics
are used to group layout elements to form larger layout elements. This is a logical
[...] An AF module receives the Layout Objects
13 as input and produces Articles 17 as output by article formation. In this module,
several layout objects that constitute a larger entity are grouped together. The larger
entity is assembled using layout rules that apply to the original picture. For example
in a newspaper page the AF module groups the text blocks and graphical elements
like pictures to form the separate articles, according to the layout rules of that specific
newspaper style. Knowledge of the layout type of the image, e.g. Western type
magazine, Scientific text or Japanese article layouts, can be used for a rule-based
approach of article formation resulting in an improved grouping of text blocks.

According to the invention additional steps are added to the segmentation as
described below. The steps relate to segmentation of the image into fields before
detecting elements within a field, i.e. before forming layout objects that are
constituted by smaller, separated but interrelated items. Figure 2 shows a sample
Japanese newspaper. Such newspapers have a specific layout that includes text
lines in both horizontal reading direction 22 and vertical reading direction 21. The
problem for a traditional bottom-up grouping process of detected connected
components is that it is not known in which direction the grouping should proceed.
Hence the segmentation is augmented by an additional step of processing the
background for detecting the fields in the page. Subsequently the reading direction
for each field of the Japanese paper is detected before the grouping of characters is
performed.

In an embodiment of the method, separator elements, e.g. black lines 23 for
separating columns, are detected and converted into background elements. With this
option it is possible to separate large elements of black lines 23 containing vertical
and horizontal lines that are actually connected into different separator elements. In
Japanese newspapers, lines are very important objects for separating fields in the
layout. It is required that these objects are recognized as lines along separation
directions. Without this option, these objects would be classified as graphics. Using
the option, the lines can be treated as separator elements in the different orientations,
separately for each separation direction.

Figure 3 shows a basic method of merging objects in a single direction. The
Figure depicts the basic function of the LA module 15 for finding the layout objects
oriented in a known direction, such as text blocks for the situation that the reading
order is known. Connected components 12 are processed in a first, analysis step 31
by statistical analysis resulting in computed thresholds 32. In a second, classification
step 33 the CC-classification is corrected resulting in the corrected connected
components 34, which are processed in a third, merging step 35 to join characters to
text lines, resulting in text lines and other objects 36. In a fourth, text merging step 37
the text lines are joined to text blocks 38 (and possibly other graphical objects).
According to the requirements for Japanese newspapers the traditional joining of
objects must be along at least two reading directions, and the basic method
described above must be improved therefor.

Figure 4 shows segmentation and two directional joining of objects. New
additional steps have been added compared to the single directional processing in
Figure 3. In a first (pre-) processing step a graph 41 of the image is constructed. The
construction of the graph by finding field separators is described below. In the graph,
fields are detected in field detection step 42 by finding areas that are enclosed by
edges of the graph. The relevant areas are classified as fields containing text blocks
47. In the text block 47 (using the connected components 43 or corrected connected
components 34 that are in the text block area) the reading order 45 is determined in
step 44. The reading direction detection is based upon the document spectrum, e.g.
on the method of O'Gorman and Kasturi described in 'Document Image Analysis',
IEEE Computer Society Press, Los Alamitos, 1995. Using the fields of the text blocks
47, the contained connected components 43 and the reading order 45 as input, the
Line Build step 46 joins the characters to lines as required along the direction found.

Now the constructing of the graph 41 is described. A graph-representation of
a document is created using the background of a scan. Pixels in the scan are
classified as background (usually white) or foreground (usually black). Because only
large areas of white provide information on fields, small noise objects are removed,
e.g. by down-sampling the image. The down-sampled image may further be
de-speckled to remove single foreground (black) pixels.

The next task is to extract the important white areas. The first step is to detect
so-called white runs, one pixel high areas of adjacent background pixels. White runs
that are shorter than a predetermined minimal length are excluded from the
processing.
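The white-run detection can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows in which 0 marks a background pixel and any other value marks foreground; the function name `white_runs`, the parameter `min_length` and the convention that the bottom-right corner is exclusive (so a run in row y spans y to y+1, as in the coordinates used for Figure 6) are assumptions of this sketch, not part of the described method.

```python
from typing import List, Tuple

# A run is stored as ((x1, y1), (x2, y2)): top-left and bottom-right corner,
# with the bottom-right corner exclusive (assumption of this sketch).
Rect = Tuple[Tuple[int, int], Tuple[int, int]]


def white_runs(image: List[List[int]], min_length: int = 1) -> List[Rect]:
    """Return the horizontal runs of background pixels (value 0), row by row."""
    runs: List[Rect] = []
    for y, row in enumerate(image):
        start = None
        # A trailing foreground sentinel closes a run that reaches the row end.
        for x, value in enumerate(list(row) + [1]):
            if value == 0 and start is None:
                start = x                      # a new run begins here
            elif value != 0 and start is not None:
                if x - start >= min_length:    # drop runs that are too short
                    runs.append(((start, y), (x, y + 1)))
                start = None
    return runs
```

Because rows are visited from top to bottom, the returned list is already sorted on the y value, which is the ordering the construction algorithm below expects for its input list.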

Figure 5 shows, as an example, four horizontal runs 51 of white pixels, that
are adjacent in vertical direction. Foreground area 53 is assumed to have foreground
pixels directly surrounding the white runs 51. A "maximal white rectangle" is defined
as the largest rectangular area that can be constructed from the adjacent white runs
51, hence a rectangular white area that can not be extended without including black
(foreground) pixels. A maximal white rectangle 52 is shown based on the four white
runs 51 having a length as indicated by the vertical dotted lines and a width of 4
pixels. When a white rectangle can not be extended it has a so-called maximal
separating power. Such a rectangle is not a smaller part of a more significant white
area. Hence the rectangle 52 is the only possible maximal rectangle of width 4.
Further rectangles can be constructed of width 3 or 2. A further example is shown in
Figure 6.

The construction of white rectangles is done separately in different separation
directions, e.g. horizontal and vertical white rectangles. Vertical white rectangles are
detected by rotating the image, and detecting horizontal white runs for the rotated
image. It is noted that depending on the type of image or application also other
separation directions may be selected such as diagonal.

An algorithm for constructing maximal white rectangles is as follows. The
input of the algorithm consists of all horizontal one pixel high white runs (WR)
detected from a given image. Each white run is represented as a rectangle
characterized by a set of coordinates $((x_1, y_1), (x_2, y_2))$, where $x_1$ and $y_1$ are
the coordinates of its top left corner and $x_2$, $y_2$ are the coordinates of its bottom right
corner. Each white run present in the active ordered object INPUT LIST is tested on
an extension possibility. The extension possibility is formulated in the condition
whether a given WR, labeled by p, can produce a maximal white rectangle (MWR) or
not. If the extension possibility is FALSE, p is already a maximal one, p is deleted
from the active INPUT LIST and written to the active RESULT LIST. If the extension
possibility is TRUE, the test for extension is repeated until all MWRs initiated by p
have been constructed. Then p is deleted from the INPUT LIST and all MWRs
obtained from p are written to the active RESULT LIST. When all white rectangles
from the INPUT LIST have been processed, the RESULT LIST will contain all MWRs.
To increase the efficiency of the algorithm, a sort on the y value is applied to the
INPUT LIST. First, the algorithm is applied for horizontal WRs, i.e. for white runs with
width larger than height. After a 90° turn of the image it can be applied to vertical
WRs.

In an embodiment the algorithm for constructing the maximal rectangles is as
follows. The rectangle data are stored as a linked list, with, at least, the coordinates
of the rectangle vertices contained in it. The INPUT and RESULT LISTs are stored
as a linked list too, with, at least, three elements, such as the number of white
rectangles, and pointers to the first and the last element in the linked list. The
following steps are executed: Activate INPUT LIST; Initiate RESULT LIST; Initiate
BUFFER for temporary coordinates of the selected rectangle. Start from the first
white rectangle, labeled by $p_1$, out of the active ordered INPUT LIST. The next white
rectangle on the list is labeled by $p_2$. For each white rectangle on the INPUT LIST
examine if $p_1$ has extension possibility. For the active white rectangle $p_1$, find the first
one labeled by $p_{n_j}$, $j = 1, \ldots, l$, on the active ordered INPUT LIST, which satisfies

$$y_2(p_1) = y_1(p_{n_j}),$$
$$x_2(p_{n_j}) \geq x_1(p_1).$$

This search results in the set $\{p_{n_1}, p_{n_2}, \ldots, p_{n_l}\}$. Only if the set
$\{p_{n_1}, p_{n_2}, \ldots, p_{n_l}\}$ is not empty, $p_1$ is said to have extension possibility.

- If $p_1$ does not have an extension possibility, then $p_1$ is a maximal white rectangle.
Write $p_1$ to the RESULT LIST, and remove $p_1$ from the INPUT LIST, and proceed
with $p_2$.

- If $p_1$ is extendible, then apply the extension procedure to $p_1$. Proceed with $p_2$. We
note here that $p_1$ can have an extension possibility while being maximal itself.
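The test on extension possibility can be sketched as a search for candidate rectangles. The sketch below assumes rectangles are stored as `((x1, y1), (x2, y2))` tuples; the function name `extension_candidates` is hypothetical. The first two conditions are the ones stated above. The third condition, that the candidate does not start to the right of p, is an assumption added here so that the candidate actually overlaps p horizontally.

```python
def extension_candidates(p, input_list):
    """Return the rectangles on input_list with which p can be extended."""
    (px1, _), (px2, py2) = p
    candidates = []
    for q in input_list:
        (qx1, qy1), (qx2, _) = q
        starts_below_p = qy1 == py2      # y2(p) = y1(q)
        reaches_p_left = qx2 >= px1      # x2(q) >= x1(p)
        starts_before_end = qx1 <= px2   # assumed symmetric condition
        if starts_below_p and reaches_p_left and starts_before_end:
            candidates.append(q)
    return candidates
```

A rectangle has extension possibility exactly when the returned list is non-empty. For the four white runs of Figure 6, the first run has one candidate and the second run has two.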

The Extension Procedure is as follows. Suppose $p_1$ has an extension possibility, then
there is the set $\{p_{n_1}, p_{n_2}, \ldots, p_{n_l}\}$. The extension procedure is applied to each
element of $\{p_{n_1}, p_{n_2}, \ldots, p_{n_l}\}$ consistently. For the white rectangle $p_1$, which is
extendible with rectangle $p_{n_j}$, $j = 1, \ldots, l$, construct a new rectangle $p_{1,n_j}$ with
coordinates:

$$x_1(p_{1,n_j}) = \max\{x_1(p_1), x_1(p_{n_j})\},$$
$$x_2(p_{1,n_j}) = \min\{x_2(p_1), x_2(p_{n_j})\},$$
$$y_1(p_{1,n_j}) = y_1(p_1),$$
$$y_2(p_{1,n_j}) = y_2(p_{n_j}).$$

Write the coordinates of $p_{1,n_j}$, $j = 1, \ldots, l$, to the "coordinates" buffer. Repeat the test
on extension possibility now for $p_{1,n_j}$. If the test is FALSE, $p_{1,n_j}$ is maximal: write
$p_{1,n_j}$ to the RESULT LIST. Otherwise, extend $p_{1,n_j}$.
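The four coordinate formulas of the extension procedure translate directly into a small function. This is a sketch assuming rectangles stored as `((x1, y1), (x2, y2))` tuples; the name `extend` is hypothetical.

```python
def extend(p, q):
    """Extend rectangle p downwards with rectangle q.

    The new rectangle keeps the top of p, takes the bottom of q and is
    narrowed to the horizontal range that p and q have in common.
    """
    (px1, py1), (px2, _) = p
    (qx1, _), (qx2, qy2) = q
    x1 = max(px1, qx1)
    x2 = min(px2, qx2)
    return ((x1, py1), (x2, qy2))
```

Applied to the white runs of Figure 6, the function reproduces the coordinates derived in the worked example further below.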

Before applying the extension procedure to $p_{1,n_j}$, we check $p_1$ and $p_{n_j}$ for
absorption effect. The test of $p_1$ and $p_{n_j}$ for absorption effect with $p_{1,n_j}$ is as follows. By
absorption effect we mean the situation in which $p_1$, $p_{n_j}$, or both are completely
contained in $p_{1,n_j}$. In coordinates this means:

$$x_1(p_{1,n_j}) \leq x_1(p_k),$$
$$x_2(p_{1,n_j}) \geq x_2(p_k), \quad \text{where } k = 1, n_j, \; j = 1, \ldots, l.$$

If the condition is TRUE for $p_1$, then $p_1$ is absorbed by $p_{1,n_j}$. Remove $p_1$ from the
INPUT LIST. If the condition is TRUE for $p_{n_j}$, then $p_{n_j}$ is absorbed by $p_{1,n_j}$. Remove $p_{n_j}$
from the INPUT LIST.
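The absorption test compares only horizontal extents, because the extended rectangle spans the rows of both source rectangles by construction. A minimal sketch, with the hypothetical name `is_absorbed` and rectangles stored as `((x1, y1), (x2, y2))` tuples:

```python
def is_absorbed(rect, extended):
    """True if the horizontal extent of rect lies within that of extended."""
    return extended[0][0] <= rect[0][0] and extended[1][0] >= rect[1][0]
```

With the rectangles of Figure 6, both the first and the second white run are absorbed by their common extension, while the third white run is not absorbed by the rectangle it produces.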

The algorithm assumes that the rectangle is wider than it is high, and thus the
rectangles are primarily horizontal. To construct MWRs in vertical direction, the
original binary image is rotated by 90° clockwise. The algorithm mentioned above is
repeated for the rotated image. As a result, all vertical MWRs for the original image
are constructed.

Figure 6 shows construction of maximal white rectangles. The pixel
coordinates are displayed along a horizontal x axis and a vertical y axis. Four white
runs 61 are shown left in the Figure. The white runs (WR) are described as
rectangles with the coordinates of their upper left and bottom right corners, respectively:
WR1 : ((10,1),(50,2)),
WR2 : ((10,2),(50,3)),
WR3 : ((5,3),(30,4)),
WR4 : ((40,3),(60,4)).

All maximal white rectangles from these white runs are constructed. The resulting
five maximal white rectangles (MWR) are shown in the right part of the Figure as
indicated by 62, 63, 64, 65 and 66. The five MWR shown are the complete set of
MWR for the WR given in the left part of the Figure. A construction algorithm is as
follows.

Let the INPUT LIST contain the four white runs 61. The first element from the
INPUT LIST is WR1 ((10,1),(50,2)). Label WR1 as $p_1$. Examine $p_1$ on the extension
possibility as described above. The first candidate for extension is
WR2 ((10,2),(50,3)). Label WR2 as $p_{n_1}$. Extend $p_1$ with $p_{n_1}$ according to the formula for
extension above, which gives a new rectangle $p_{1,n_1}$ with the coordinates
((10,1),(50,3)). Test $p_1$ and $p_{n_1}$ on the absorption effect with $p_{1,n_1}$. As follows from
the absorption test, both $p_1$ and $p_{n_1}$ are absorbed by $p_{1,n_1}$. Therefore, delete $p_1$ and
$p_{n_1}$ from the INPUT LIST. Proceed with $p_{1,n_1}$. Test $p_{1,n_1}$ on the extension possibility,
which gives the first candidate WR3 ((5,3),(30,4)). Label WR3 as $p_{t_1}$. Extend $p_{1,n_1}$ with
$p_{t_1}$ according to the extension formula. As a result, we obtain a new rectangle
$p_{(1,n_1),t_1}$ with the coordinates ((10,1),(30,4)). Test $p_{1,n_1}$ and $p_{t_1}$ on the absorption
effect with $p_{(1,n_1),t_1}$. The test fails.

Repeat the test on extension possibility for $p_{(1,n_1),t_1}$. The test fails, i.e. $p_{(1,n_1),t_1}$ has no
extension possibility. It means that $p_{(1,n_1),t_1}$ is maximal. Write $p_{(1,n_1),t_1}$ with the
coordinates ((10,1),(30,4)) to the RESULT LIST.

Proceed again with $p_{1,n_1}$ and test it on extension possibility. The second candidate
WR4 ((40,3),(60,4)) is found. Label WR4 as $p_{t_2}$. Extend $p_{1,n_1}$ with $p_{t_2}$ according to the
extension formula. As a result, we obtain a new rectangle $p_{(1,n_1),t_2}$ with the
coordinates ((40,1),(50,4)).

Test $p_{1,n_1}$ and $p_{t_2}$ on the absorption effect with $p_{(1,n_1),t_2}$. The test fails, i.e. no
absorption. Repeat the test on extension possibility for $p_{(1,n_1),t_2}$ and the test fails, i.e.
$p_{(1,n_1),t_2}$ has no extension possibility. It means that $p_{(1,n_1),t_2}$ is maximal. Write
$p_{(1,n_1),t_2}$ with the coordinates ((40,1),(50,4)) to the RESULT LIST.

Test $p_{1,n_1}$ again on extension possibility. The test fails and $p_{1,n_1}$ is maximal. Write
$p_{1,n_1}$ with the coordinates ((10,1),(50,3)) to the RESULT LIST.
Return to the INPUT LIST. The INPUT LIST at this stage contains two white runs, i.e.
WR3 : ((5,3),(30,4)), WR4 : ((40,3),(60,4)). Start from WR3, and label it as $p_2$. Repeat
the test on extension possibility for $p_2$. The test fails, $p_2$ is maximal. Write $p_2$ with the
coordinates ((5,3),(30,4)) to the RESULT LIST. Remove $p_2$ from the INPUT LIST.
Proceed with WR4 and label it as $p_3$. The test on extension possibility for $p_3$ gives us that
$p_3$ is maximal. Write $p_3$ with the coordinates ((40,3),(60,4)) to the RESULT LIST.
Remove $p_3$ from the INPUT LIST. Finally, the RESULT LIST contains five maximal
white rectangles, i.e. MWR1 : ((10,1),(50,3)) indicated in Figure 6 as 64, MWR2 :
((10,1),(30,4)) indicated as 62, MWR3 : ((40,1),(50,4)) indicated as 63,
MWR4 : ((5,3),(30,4)) as 65, and MWR5 : ((40,3),(60,4)) as 66.
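The walk-through above can be reproduced with a compact recursive sketch. It is not the linked-list procedure of the description: a set named `absorbed` stands in for deleting entries from the INPUT LIST, the horizontal overlap test uses strict inequalities (an assumption, made so that zero-width rectangles are never produced), and a final containment filter is added as a safety net. All function names are hypothetical.

```python
def extend(p, q):
    """Extension formula: keep the top of p, the bottom of q, the common x-range."""
    return ((max(p[0][0], q[0][0]), p[0][1]), (min(p[1][0], q[1][0]), q[1][1]))


def covers_x(outer, inner):
    """Absorption test: the x-extent of inner lies within that of outer."""
    return outer[0][0] <= inner[0][0] and outer[1][0] >= inner[1][0]


def contains(outer, inner):
    """Full containment of inner in outer."""
    return (covers_x(outer, inner)
            and outer[0][1] <= inner[0][1] and outer[1][1] >= inner[1][1])


def maximal_white_rectangles(white_runs):
    """Construct maximal white rectangles from one-pixel-high white runs."""
    input_list = sorted(white_runs, key=lambda r: (r[0][1], r[0][0]))  # sort on y
    absorbed = set()   # runs that would have been removed from the INPUT LIST
    result = []

    def grow(p):
        # Candidates start directly below p and overlap it horizontally.
        candidates = [q for q in input_list
                      if q not in absorbed
                      and q[0][1] == p[1][1]
                      and q[0][0] < p[1][0] and q[1][0] > p[0][0]]
        maximal = True
        for q in candidates:
            e = extend(p, q)
            if covers_x(e, q):
                absorbed.add(q)
            if covers_x(e, p):
                maximal = False    # p is absorbed by its own extension
            grow(e)
        if maximal and p not in result:
            result.append(p)

    for run in input_list:
        if run not in absorbed:
            grow(run)
    # Safety net of this sketch: drop rectangles contained in another result.
    return [r for r in result
            if not any(o != r and contains(o, r) for o in result)]
```

For the four white runs of Figure 6 the sketch returns the five rectangles listed above. The recursion depth is bounded by the number of image rows, so an iterative formulation may be preferable for full-resolution scans.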

Figure 7 shows a next step in the method according to the invention, namely
a cleaning step of overlapping maximal white rectangles. In the cleaning step, plural
overlapping maximal white rectangles are consolidated into a single so-called
"Informative White Rectangle" (IWR) that combines the most relevant properties of
the original maximal white rectangles, as discussed below in detail.

The cleaning may further include steps like checking on size and spatial
relation. The upper part of Figure 7 shows, as an example, two maximal white
rectangles MWR1 and MWR2. The pair is consolidated into a single Informative
White Rectangle IWR in the cleaning step as shown in the lower part of the Figure.
The process of detecting overlap and consolidating is repeated until no relevant pairs
can be formed anymore. A criterion for forming pairs may be the size of the overlap
area.

Further cleaning steps may include removing thin or short rectangles or
rectangles that have an aspect ratio below a certain predefined value. The criteria for
removing are based on the type of image, e.g. a width below a predefined number of
pixels indicates a separator of text lines and is not relevant for separating fields, and
a length below a certain value is not relevant in view of the expected sizes of the
fields.
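The removal criteria can be sketched as a single predicate. The function name `keep_rectangle` and the threshold values are hypothetical placeholders; suitable values depend on the type of image and the scan resolution, which the description leaves open.

```python
def keep_rectangle(rect, min_width=3, min_length=20, min_aspect_ratio=2.0):
    """Decide whether a maximal white rectangle survives the cleaning filters."""
    (x1, y1), (x2, y2) = rect
    short_side, long_side = sorted((x2 - x1, y2 - y1))
    if short_side < min_width:       # too thin: likely a gap between text lines
        return False
    if long_side < min_length:       # too short to separate fields
        return False
    # Aspect ratio: longer side divided by shorter side.
    return long_side / short_side >= min_aspect_ratio
```

A long, moderately wide strip passes, while thin, short or nearly square rectangles are rejected.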

An algorithm for the cleaning step is as follows. The start of the cleaning
procedure is the whole set of MWRs constructed as described above with reference
to Figures 5 and 6. The cleaning procedure is applied to discard non-informative
MWRs. For this reason a measure of non-informativeness is defined. For example a
long MWR is more informative than a short one. A low aspect ratio indicates a more
or less square rectangle that is less informative. Further, extremely thin rectangles,
which for instance separate two text lines, must be excluded. First, all MWRs are
classified as being horizontal, vertical or square by computing the ratio between
their heights and widths. Square MWRs are deleted because of their
non-informativeness. For the remaining horizontal and vertical MWRs the cleaning
technique is applied which consists of three steps:

- Each MWR with a length or width below a given value is deleted.

- Each MWR with aspect ratio (AR), defined as the ratio of the longer side length
divided by the shorter side length, below a given value is deleted.

- For each pair of overlapping horizontal (or vertical) MWR1 $((x_1, y_1), (x_2, y_2))$ and
horizontal (or vertical) MWR2 $((a_1, b_1), (a_2, b_2))$, an informative white rectangle IWR is
constructed with the following coordinates:

(a) Horizontal overlap:
$$x'_1 = \min\{x_1, a_1\},$$
$$y'_1 = \max\{y_1, b_1\},$$
$$x'_2 = \max\{x_2, a_2\},$$
$$y'_2 = \min\{y_2, b_2\}.$$

(b) Vertical overlap:
$$x'_1 = \max\{x_1, a_1\},$$
$$y'_1 = \min\{y_1, b_1\},$$
$$x'_2 = \min\{x_2, a_2\},$$
$$y'_2 = \max\{y_2, b_2\}.$$

This process is repeated for all pairs of overlapping MWRs. The set of MWRs now
comprises Informative White Rectangles IWRs. These IWRs form the starting point
for an algorithm for segmentation of the image into fields corresponding to the lay-out
elements. The IWRs are potential field separators and are therefore called
"separating elements". Using the IWRs, the algorithm constructs a graph for further
processing into a geographical description of the image.
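The consolidation of one overlapping pair into an informative white rectangle follows the coordinate formulas for horizontal and vertical overlap. A minimal sketch, assuming rectangles stored as `((x1, y1), (x2, y2))` tuples; the function name and the `horizontal` flag are hypothetical.

```python
def informative_rectangle(r1, r2, horizontal=True):
    """Consolidate two overlapping MWRs of the same orientation into one IWR."""
    (x1, y1), (x2, y2) = r1
    (a1, b1), (a2, b2) = r2
    if horizontal:
        # Longest extent in x, restricted to the rows both rectangles share.
        return ((min(x1, a1), max(y1, b1)), (max(x2, a2), min(y2, b2)))
    # Longest extent in y, restricted to the columns both rectangles share.
    return ((max(x1, a1), min(y1, b1)), (min(x2, a2), max(y2, b2)))
```

The result is longer than either input along the separation direction and narrower across it, which matches the aim of keeping long, narrow separating elements. The sketch does not check that the two inputs actually overlap; that is left to the caller.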

Figure 8 shows such a graph on a newspaper page. The picture shows a
down-sampled digital image 80 of a newspaper page. The original text is visible in
black in a down-sampled version corresponding to Figure 2. The informative
rectangles IWR constituting separating elements are shown in gray. For the
construction of the graph, intersections of separating elements constituted by
horizontal and vertical white IWRs are determined. The intersection point of two
IWRs is indicated by a small black square representing a vertex 81 in the
graph. Edges 82 that represent lines that separate the fields in the page are
constructed by connecting pairs of vertices 81 via "field separators". The edges 82 of
the graph are shown in white. The distance between the two vertices of an edge, i.e.
the length, is assigned as weight to the edge for further processing. In an alternative
embodiment a different parameter is used for assigning the weight, e.g. the colour of
the pixels. An algorithm for constructing the graph is as follows.



At the beginning, the following notation and definitions for IWRs are given. Let
$R = \{r_1, \ldots, r_m\}$ be the non-empty and finite set of all IWRs obtained from a given image
I, where each IWR is specified by its x- and y-coordinates of top left corner and
bottom right corner $((x_1^{(\tau)}, y_1^{(\tau)}), (x_2^{(\tau)}, y_2^{(\tau)}))$, $\tau = 1, 2, \ldots, m$, respectively. Each
rectangle $r_\tau$ is classified as horizontal, vertical or square based on the ratio of its
height and width. $H = \{h_1, \ldots, h_l\}$, $V = \{v_1, \ldots, v_k\}$, and $S = \{s_1, \ldots, s_d\}$ denote the
subsets of horizontal, vertical and square IWRs, respectively, such that

$$H \cup V \cup S = R \quad \text{and} \quad m = l + k + d,$$
$$H \cap V = \emptyset, \quad V \cap S = \emptyset, \quad H \cap S = \emptyset,$$

where it is assumed that

$$H \neq \emptyset, \quad V \neq \emptyset.$$

Further the contents of S are ignored and only the subsets H and V are used. This is
based on the consideration that in most cases white spaces that form the border of
text or non-text blocks are oblong vertical or horizontal areas. Let h be an element of H with
coordinates $((x_1, y_1), (x_2, y_2))$ and v in V with coordinates $((a_1, b_1), (a_2, b_2))$. Then h and v
have overlap if

$$x_1 \leq a_2,$$
$$y_1 \leq b_2,$$
$$x_2 \geq a_1,$$
$$y_2 \geq b_1.$$

By the intersection point of h and v in case of overlap, we take the unique point P
defined by the coordinates $(x_P, y_P)$.



For IWRs only two from all possible types of overlap occur, namely overlap resulting
in a rectangle and overlap resulting in a point. Line overlap cannot occur, because
this would be in contradiction with the concept of the MWRs.
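The overlap test for a horizontal and a vertical IWR can be sketched as follows. The choice of the intersection point is an assumption of this sketch: it takes the center of the common region, in line with the description of Figure 9, since no explicit coordinate formula is given in the text. The function names are hypothetical.

```python
def overlaps(h, v):
    """Overlap test for a horizontal IWR h and a vertical IWR v."""
    (x1, y1), (x2, y2) = h
    (a1, b1), (a2, b2) = v
    return x1 <= a2 and y1 <= b2 and x2 >= a1 and y2 >= b1


def intersection_point(h, v):
    """Center of the common region of h and v, or None without overlap.

    Taking the center is an assumption of this sketch.
    """
    if not overlaps(h, v):
        return None
    (x1, y1), (x2, y2) = h
    (a1, b1), (a2, b2) = v
    left, right = max(x1, a1), min(x2, a2)
    top, bottom = max(y1, b1), min(y2, b2)
    return ((left + right) / 2, (top + bottom) / 2)
```

When the overlap is a rectangle the point is its center; when the two rectangles only touch in a corner, the common region degenerates and the returned point is that corner.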
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Figure 9 shows two types of intersection of maximal rectangles. For 
constructing the graph the intersection points of vertical and horizontal informative : 
maximal rectangles are determined to find th position of vertices of the graph, i.e. to 
determine the exact coordinates of the vertices. The left part of the Figure shows a 
5 first type of intersection of vertical IWR v and a horizontal IWR h, which results in a 
rectangular area 88 with a center of intersection point P. The right part of the Figure 
shows a second type of intersection of a vertical IWR v and a horizontal IWR h f that 
results in a single intersection point 89 with a center of intersection at P\ 
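
The classification, overlap test and intersection point described above can be sketched as follows. This is a minimal illustration only: the `Rect` type and the function names are hypothetical, the classification rule (a plain comparison of width and height, without any tolerance) is an assumption because the description only says that the ratio of height and width is used, and the intersection point is computed as the center of the overlap area as suggested by Figure 9.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: (x1, y1) is top left, (x2, y2) is bottom right."""
    x1: float
    y1: float
    x2: float
    y2: float


def classify(r: Rect) -> str:
    """Assumed rule: compare width and height directly, without a tolerance."""
    width, height = r.x2 - r.x1, r.y2 - r.y1
    if width > height:
        return "horizontal"
    if height > width:
        return "vertical"
    return "square"


def overlaps(h: Rect, v: Rect) -> bool:
    """Four overlap conditions; touching in a single point counts as overlap."""
    return h.x1 <= v.x2 and h.y1 <= v.y2 and h.x2 >= v.x1 and h.y2 >= v.y1


def intersection_point(h: Rect, v: Rect) -> Optional[Tuple[float, float]]:
    """Center of the overlap area, or None when h and v do not overlap."""
    if not overlaps(h, v):
        return None
    left, right = max(h.x1, v.x1), min(h.x2, v.x2)
    top, bottom = max(h.y1, v.y1), min(h.y2, v.y2)
    return ((left + right) / 2, (top + bottom) / 2)
```

For the point type of overlap the overlap area degenerates to a single point, so the same function returns that point itself.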

An algorithm for constructing the graph based on the intersection points is as
follows.

$P = \{p_1, \ldots, p_N\}$ denotes the set of all intersection points of vertical IWRs
and horizontal IWRs, where each $p$ in $P$ is specified by its x- and y-coordinates
$(x_p, y_p)$, where $p = 1, \ldots, N$. Let the set $P$ be found, and $G = (X, A)$ an
undirected graph having correspondence to $P$. The graph $G = (X, A)$ consists of a
finite number of vertices $X$ which are directly related to the intersection points
and a finite number of edges $A$ which describe the relation between intersection
points. Mathematically this is expressed as

$$G(P) = (X(P), A(P \times P)),$$
$$P: H \times V \to \{x_P, y_P\},$$

where

$$X = \{1, \ldots, N\} \quad \text{and} \quad A = (\{1, \ldots, N\} \times \{1, \ldots, N\})$$

with

$$A(i,j) = \begin{cases} \infty, & \text{if } i \text{ and } j \text{ are not 4-chain connected,} \\ d_{ij}, & \text{if } i \text{ and } j \text{ are 4-chain connected,} \end{cases}$$

where $d_{ij}$ indicates the Euclidean distance between points $i$ and $j$, and where
4-chain connected means that the vertices of a rectangular block are connected in
four possible directions of movement. In the above, two points $i$ and $j$ are 4-chain
connected if they can be reached by walking around with the aid of 4-connected
chain codes with minimal $d_{ij}$ in one direction.
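
A possible way to build the weighted edge set is sketched below. It rests on an assumption that is not spelled out in the description: two vertices are treated as 4-chain connected when they are consecutive intersection points along the same horizontal or vertical IWR, which gives each vertex its nearest neighbour in each direction of movement. The function names and the input layout (index of the horizontal IWR, index of the vertical IWR, point) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edges = Dict[Tuple[int, int], float]


def build_graph(intersections: List[Tuple[int, int, Point]]) -> Edges:
    """Return edge weights keyed by (i, j) with i < j.

    intersections[n] = (horizontal IWR index, vertical IWR index, point);
    the list position n is the vertex number.
    """
    along_h = defaultdict(list)  # vertices lying on each horizontal IWR
    along_v = defaultdict(list)  # vertices lying on each vertical IWR
    for n, (h, v, _) in enumerate(intersections):
        along_h[h].append(n)
        along_v[v].append(n)

    edges: Edges = {}

    def link(chain: List[int], axis: int) -> None:
        # Sort along the separator and connect consecutive vertices only.
        chain.sort(key=lambda n: intersections[n][2][axis])
        for i, j in zip(chain, chain[1:]):
            d = math.dist(intersections[i][2], intersections[j][2])
            edges[(min(i, j), max(i, j))] = d

    for chain in along_h.values():
        link(chain, 0)  # horizontal separators: order by x
    for chain in along_v.values():
        link(chain, 1)  # vertical separators: order by y
    return edges


def weight(edges: Edges, i: int, j: int) -> float:
    """A(i, j): Euclidean distance if connected, infinity otherwise."""
    return edges.get((min(i, j), max(i, j)), math.inf)
```

Storing only the connected pairs and returning infinity for all others keeps the representation sparse, which matches the definition of $A(i,j)$ without materialising an $N \times N$ matrix.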

The graph as constructed may now be further processed for classifying the
areas within the graph as text blocks or a similar classification depending on the type
of picture. In an embodiment the graph is augmented by including foreground
separators, e.g. black lines or patterned lines such as dashed/dotted lines, in the
analysis. Also, edges of photos or graphic objects which are detected can be
included in the analysis.

The present segmenting method may also include a step of removing
foreground separators. First, foreground separators are recognized and
reconstructed as single objects. The components that constitute a patterned line are
connected by analyzing element heuristics, spatial relation heuristics and line
heuristics, i.e. building a combined element in a direction and detecting if it classifies
as a line. A further method for reconstructing a solid line from a patterned line is
down-sampling and/or using the Run Length Smoothing Algorithm (RLSA) as
described by K.Y. Wong, R.G. Casey, F.M. Wahl in "Document analysis system", IBM
J. Res. Dev. 26 (1982) 647-656. After detecting the foreground separators they are
replaced by background pixels. The effect is that larger maximal white rectangles can
be constructed; the replacement equally supports any other suitable method that uses
the background pixel property for finding background separators.
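
The run length smoothing idea can be illustrated on a single pixel row. The sketch below is a simplified one-dimensional variant, not the full procedure from the cited article: it assumes a binary row in which 1 is foreground and 0 is background, and it fills only those background gaps that are bounded by foreground on both sides and are no longer than a threshold. The name `rlsa_row` and the parameter `max_gap` are hypothetical.

```python
from typing import List


def rlsa_row(row: List[int], max_gap: int) -> List[int]:
    """Fill background gaps of at most max_gap pixels between foreground pixels."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for k in range(last_fg + 1, i):
                    out[k] = 1
            last_fg = i
    return out


# A dashed line with one-pixel gaps becomes a solid run.
dashed = [1, 1, 0, 1, 1, 0, 1, 1]
solid = rlsa_row(dashed, max_gap=1)
```

Once a patterned line has been merged into a solid run in this way, it can be detected as a single foreground separator and its pixels replaced by background pixels.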

Figure 10 shows a device for segmenting a picture. The device has an input
unit 91 for entering a digital image. The input unit may comprise a scanning unit,
such as an electro-optical scanner, for scanning an image from physical documents,
and/or a digital communication unit for receiving the image from a network like
the internet, and/or a playback unit, like an optical disc drive, for retrieving digital
information from a record carrier. The input unit 91 is coupled to a processing unit 94,
which cooperates with a memory unit 92. The processing unit may comprise a general
purpose computer central processing unit (CPU) and supporting circuits and
operates using software for performing the segmentation as described above. The
processing unit may include a user interface 96 provided with control means such as
a keyboard, a mouse device or operator buttons. The output of the processing unit is
coupled to a display unit 93. The display unit may comprise a display screen, a
printing unit for outputting a processed image on paper, and/or a recording unit for
storing the segmented image on a record carrier like a magnetic tape or optical disk.
Although the invention has been mainly explained by embodiments using a
Japanese newspaper page as the digital image to be segmented, the invention is
also suitable for any digital representation of any text or image having a layout in
fields on a background, such as electrical circuits in layout images for IC design or
streets and buildings on city maps. It is noted that in this document the use of the
verb 'comprise' and its conjugations does not exclude the presence of other elements
or steps than those listed and the word 'a' or 'an' preceding an element does not
exclude the presence of a plurality of such elements, that any reference signs do not
limit the scope of the claims, that the invention and every unit or means mentioned
may be implemented by suitable hardware and/or software and that several 'means'
or 'units' may be represented by the same item. Further, the scope of the invention is
not limited to the embodiments, and the invention lies in each and every novel
feature or combination of features described above.

CLAIMS

1. Method of segmenting a composite image of pixels into a number of fields
corresponding to lay-out elements of the image, the pixels having a value
representing the intensity and/or color of a picture element,
which method comprises

- constructing separating elements corresponding to rectangular areas of adjacent
pixels of the image, having a background property indicative of a background of the
image, and

- constructing a graph representing the lay-out structure of the image by

- defining vertices of the graph on the basis of intersections of separating
elements that are substantially oriented in predetermined separation
directions, in particular horizontal and vertical direction, and

- defining edges of the graph between the vertices corresponding to the field
separators,

- defining field separators corresponding to the edges of the graph.

2. Method as claimed in claim 1, wherein the step of defining vertices comprises

- constructing subsets of separating elements that are substantially oriented in the
predetermined separation directions, and

- determining the intersections between pairs of separating elements from both
subsets.

3. Method as claimed in claim 2, wherein the step of determining intersections
comprises determining an area of overlap of the separating elements from both
subsets, and locating the vertex at the center of the area of overlap.

4. Method as claimed in claim 1, wherein the graph constructing step comprises
assigning a weight to the edges indicating the Euclidean distance between the
vertices.

5. Method as claimed in any of the claims 1 to 4, further comprising

- constructing a set of maximal rectangles, a maximal rectangle being a rectangular
part of the image in one of the separation directions, that has the maximum possible
area without including a pixel not having the background property indicative of a
background of the image, and

- constructing the separating elements in a cleaning step wherein at least one pair of
overlapping maximal rectangles in the set is replaced by an informative rectangle that
is a rectangular part of an area combining the areas of the pair, which rectangular
part has the maximum possible length in the relevant separation direction.

6. Method as claimed in claim 5, wherein said cleaning step further comprises at
least one of the following:

- deleting a maximal rectangle having a length below a predefined value;

- deleting a maximal rectangle having a width below a predefined value;

- deleting a maximal rectangle having an aspect ratio below a predefined value, the
aspect ratio being the longer side length divided by the shorter side length.

7. Method as claimed in claim 5 or 6, wherein, prior to said step of constructing
the maximal rectangles, the image is preprocessed by at least one of the following:

- removing noise by adapting the value of isolated deviant pixels to the average value
of pixels in the neighborhood;

- halftoning by transforming the pixels to either white or black;

- reducing the number of pixels by downsampling.

8. Method as claimed in claim 5, 6 or 7, wherein, prior to said step of
constructing the maximal rectangles, the image is filtered by detecting foreground
separator elements that are objects in the foreground of the image having a pattern
of pixel values deviating from said background property, in particular black lines or
dashed or dotted lines, and by replacing pixels of the detected foreground separators
by pixels having the background property.

9. Method as claimed in any of the claims 5 to 8, wherein constructing the
maximal rectangles comprises

- determining a list of maximal runs, a maximal run being a straight line of pixels
having the background property which line has the maximum possible length without
including a pixel not having the background property,

- taking a specific maximal run from the list as rectangle,

- testing the rectangle if extension is possible by determining for a next maximal run if
the next maximal run comprises pixels adjacent to pixels of the rectangle in the width
direction,

- if the extension is possible extend the rectangle by constructing a new rectangle
having the maximum area including pixels of the rectangle and the next maximal run,

- if no extension is possible add the rectangle to the set of maximal rectangles, and

- eliminating from the list any maximal run that is completely contained in the new
rectangle.

10. Method as claimed in any of the claims 1 to 9, wherein the step of
constructing the separating elements comprises processing the image in two
orthogonal separation directions.

11. Method as claimed in any of the claims 1 to 10, wherein the step of
constructing the separating elements comprises detecting graphical elements that
are objects in the foreground of the image having a pattern of pixel values deviating
from said background property, and separating elements are constructed around the
graphical elements.



12. Method as claimed in claim 1, wherein at least one of the fields is classified
as text field, and a reading order is detected in the text field, and foreground
components are joined to text lines in the text field in said reading order.

13. Computer program product for segmenting an image of pixels into a number
of fields, which program is operative to cause a processor to perform the method as
claimed in any of the claims 1 to 12.



14. Device for segmenting a composite image of pixels into a number of fields
corresponding to lay-out elements of the image, the pixels having a value
representing the intensity and/or color of a picture element, which device comprises

- an input unit (91) for inputting an image, and

- a processing unit (94) for constructing a graph representing the lay-out structure of
the image by

- constructing separating elements corresponding to rectangular areas of adjacent
pixels having a background property indicative of a background of the image,

- defining vertices of the graph based on intersections of separating elements that
are substantially oriented in different separation directions, in particular horizontal
and vertical direction, and

- defining edges of the graph between the vertices corresponding to the separating
elements.

15. Device as claimed in claim 14, wherein the device comprises a display unit
(93) for i

ABSTRACT

A method is described for segmenting an image of pixels into a number of fields. A
graph is constructed for representing the image. First, separating elements are
constructed that are oblong areas of adjacent pixels having a background property
indicative of a background of the image. Then vertices of the graph are defined
based on intersections of separating elements that are substantially oriented in
different separation directions, in particular horizontal and vertical direction, and
edges of the graph are defined between the vertices corresponding to the field
separators.

Finally, the edges of the graph are interpreted as lines that separate the fields.

(Fig. 8)